An approach to functionally relevant clustering of the protein universe: Active site profile‐based clustering of protein structures and sequences

Authors

Stacy T Knutson

Brian M Westwood

Janelle B Leuthaeuser

Brandon E Turner

Don Nguyendac

Gabrielle Shea

Kiran Kumar

Julia D Hayden

Angela F Harper

Shoshana D Brown

John H Morris

Thomas E Ferrin

Patricia C Babbitt

Jacquelyn S Fetrow

Abstract

Protein function identification remains a significant problem. Solving this problem at the molecular functional level would allow identification of mechanistic determinants: amino acids that distinguish details between functional families within a superfamily. Active site profiling was developed to identify mechanistic determinants, and DASP and DASP2 were developed as tools to search sequence databases using active site profiling. Here, TuLIP (Two-Level Iterative clustering Process) is introduced as an iterative, divisive clustering process that uses active site profiling to separate structurally characterized superfamily members into functionally relevant clusters. Underlying TuLIP is the observation that functionally relevant families (as curated by the Structure-Function Linkage Database, SFLD) self-identify in DASP2 searches, whereas clusters containing multiple functional families do not. Each TuLIP iteration produces candidate clusters, and each candidate is evaluated with DASP2 to determine whether it self-identifies; if it does, it is deemed a functionally relevant group. Divisive clustering continues until each structure is either a member of a functionally relevant group or a singlet. TuLIP is validated on enolase and glutathione transferase structures, two superfamilies well curated by SFLD. Correlation with SFLD curation is strong, although the small numbers of structures prevent statistically significant analysis. TuLIP-identified enolase clusters are then used in DASP2 searches of GenBank to identify sequences sharing functional site features. Analysis shows a true positive rate of 96%, a false negative rate of 4%, and a maximum false positive rate of 4%. F-measure and performance analysis of the enolase search results, together with comparison to GEMMA and SCI-PHY, demonstrate that TuLIP avoids the over-division problem of these methods. Mechanistic determinants for enolase families are evaluated and shown to correlate well with literature results.
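The divisive loop summarized above (split candidate clusters, keep those that self-identify, stop at groups or singlets) can be sketched as a toy program. This is an illustration under stated assumptions, not the authors' TuLIP implementation, and DASP2 is never called. The hypothetical `self_identifies` function stands in for a DASP2 self-identification search. It treats the centroid of a cluster's feature vectors as its "profile" and a distance `radius` as the search cutoff. The hypothetical `split_in_two` function is an assumed splitting rule, because the abstract does not say how candidate clusters are generated. The 2-D points in `example_points` are made-up data, not real active site profiles.

```python
from __future__ import annotations

import math
from collections import deque

# Hypothetical toy data standing in for structures (NOT real active site
# profiles): two tight "families" plus one outlier.
example_points = [
    (0.0, 0.0), (0.2, 0.0), (0.0, 0.2),  # family A: indices 0-2
    (4.0, 4.0), (4.2, 4.0), (4.0, 4.2),  # family B: indices 3-5
    (9.0, 9.0),                          # outlier: index 6
]


def centroid(points: list[tuple[float, ...]]) -> tuple[float, ...]:
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def self_identifies(cluster: list[int], data: list[tuple[float, ...]],
                    radius: float) -> bool:
    """Hypothetical stand-in for a DASP2 self-identification check.

    The cluster's "profile" is the centroid of its members. "Searching" the
    whole dataset returns every point within `radius` of that profile. The
    cluster self-identifies if the hits are exactly its own members.
    """
    profile = centroid([data[i] for i in cluster])
    hits = {i for i, p in enumerate(data) if math.dist(p, profile) <= radius}
    return hits == set(cluster)


def split_in_two(cluster: list[int], data: list[tuple[float, ...]]):
    """Assumed splitting rule: seed with the first member and the member
    farthest from it, then assign each member to the nearer seed."""
    a = cluster[0]
    b = max(cluster, key=lambda i: math.dist(data[a], data[i]))
    if math.dist(data[a], data[b]) == 0:  # all points identical: peel one off
        return cluster[:1], cluster[1:]
    left = [i for i in cluster
            if math.dist(data[i], data[a]) <= math.dist(data[i], data[b])]
    left_set = set(left)
    right = [i for i in cluster if i not in left_set]
    return left, right


def divisive_clustering(data: list[tuple[float, ...]], radius: float):
    """Split clusters until each one self-identifies or is a singlet."""
    groups: list[list[int]] = []
    singlets: list[int] = []
    queue = deque([list(range(len(data)))] if data else [])
    while queue:
        cluster = queue.popleft()
        if len(cluster) == 1:
            singlets.append(cluster[0])        # cannot be split further
        elif self_identifies(cluster, data, radius):
            groups.append(sorted(cluster))     # accepted as a "functional group"
        else:
            queue.extend(split_in_two(cluster, data))  # try smaller candidates
    return sorted(groups), sorted(singlets)


groups, singlets = divisive_clustering(example_points, radius=0.5)
print("groups:", groups)      # groups: [[0, 1, 2], [3, 4, 5]]
print("singlets:", singlets)  # singlets: [6]
```

Because the process is purely divisive, a poor split is never undone. The choice of splitting rule therefore matters, and the two-seed rule here is only one assumed option.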

Similar resources

An Atlas of Peroxiredoxins Created Using an Active Site Profile-Based Approach to Functionally Relevant Clustering of Proteins

Peroxiredoxins (Prxs or Prdxs) are a large protein superfamily of antioxidant enzymes that rapidly detoxify damaging peroxides and/or affect signal transduction and, thus, have roles in proliferation, differentiation, and apoptosis. Prx superfamily members are widespread across phylogeny and multiple methods have been developed to classify them. Here we present an updated atlas of the Prx super...

Signal processing approaches as novel tools for the clustering of N-acetyl-β-D-glucosaminidases

Nowadays, the clustering of proteins, and of enzymes in particular, is one of the most popular topics in bioinformatics. An increasing number of chitinase genes from different organisms, and their sequences, have been identified. So far, various mathematical algorithms have been used for the clustering of chitinase genes, but most of them seem to be confusing and sometimes insufficient. In the...

Protein-Protein Interaction Analysis of Common Top Genes in Obsessive-Compulsive disorder (OCD) and Schizophrenia: Towards New Drug Approach

Comorbidity is common among psychiatric disorders, including obsessive-compulsive disorder and schizophrenia, and occurs at a high rate. Many studies have suggested that the disorders may share the same etiological bases. In this regard, shared glutamatergic, dopaminergic, and serotonergic pathways are the known ones. Here, the common significant genes are examined to understand the possible molecular origin of the diso...

Comparison of topological clustering within protein networks using edge metrics that evaluate full sequence, full structure, and active site microenvironment similarity

The development of accurate protein function annotation methods has emerged as a major unsolved biological problem. Protein similarity networks, one approach to function annotation via annotation transfer, group proteins into similarity-based clusters. An underlying assumption is that the edge metric used to identify such clusters correlates with functional information. In this contribution, th...

Bayesian search of functionally divergent protein subgroups and their function specific residues

MOTIVATION The rapid increase in the amount of protein sequence data has created a need for an automated identification of evolutionarily related subgroups from large datasets. The existing methods typically require a priori specification of the number of putative groups, which defines the resolution of the classification solution. RESULTS We introduce a Bayesian model-based approach to simul...

Journal title:

Volume 26, Issue

Pages -

Publication date: 2017